AGTCCCAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGA ATG TCGTCCCAG

CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC 

ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCGTGCCAC 

ACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 

CTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 

CCCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT 

GCTGTTTTCATGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG 

CCCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 

GGCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGT 

GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTT 

GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTAC 

TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT 

GTGCAGCTGGTGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGC 

AGCTACTCTGAGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 

CACACCTCCAAGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTAC 

ACTCCACAGCCAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGG 

ACGGCCATTTACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG 

GTGAGGGCAGGGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTC 

TCCGAGGACAAGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTG 

TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCA

CTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGT 

CCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGGTTCAGT 

GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 

GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 

CTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 

CTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTG 

ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTG 

GGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTT 

GGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTAC 

ACGTACCGAAACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTC 

TGCTCCCTGCTCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGAC 

AGCCTCAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG 

GCCAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACG 

CTGCTGCACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 

GCCCAGCCC TGA GGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCC 

TGCCTACCATCCTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCA 

GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAG 

GGCTCTGCTCCACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAAAAACTG 

GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTC 

CCTACCCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACT 

CCAGCCCAGCTCCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 

CACCCCCTCAGCGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC 

CTGCAGGGCAGCCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGA

GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTC 

CCTGCAATAAACTTGTTCCTGAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASLS
ILVLLLLAMLVRRRQLWPDCVRGRPGLPSPVDFLAGDRPRAVPAAVFMVLLSSLCLLLPD 
EDALPFLTLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLS 
WAHLGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK 
GLQSSYSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSA 
TLTGTAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLW 
ALEVCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCW 
MSFSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLA
LAVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAMVATWRVLLSALYN 
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTMA 
APQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL 
GANGAQP 

Important features of the protein: 
Signal peptide: 

None 

Transmembrane domain: 

54-69 

102-119 

148-166 

207-222 

301-320 

364-380 

431-451 

474-489 

560-535 

Motif file: 

Motif name: N-glycosylation site. 
8-12 

Motif name: N-myristoylation site. 

50-56 

176-182 

241-247 

317-323 

341-347

525-531 

627-633 

631-637 

640-646 

661-667 

Motif name: Prokaryotic membrane lipoprotein lipid attachment site. 
364-375 



Motif name: ATP/GTP-binding site motif A (P-loop). 
132-140 



FIG. 2
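The motif list above names an N-glycosylation site near residue 8. As an illustration only, the sketch below scans a protein string for the commonly used consensus N-{P}-[ST]-{P} (Asn, any residue but Pro, Ser or Thr, any residue but Pro). That consensus and the function name `next_nglyc_site` are assumptions made for this example; the figure does not say which tool or pattern produced its ranges, and it reports the range 8-12, while the sketch returns only the start position of a match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the 1-based position of the first
 * N-{P}-[ST]-{P} match starting at or after 1-based position `from`,
 * or 0 if no match exists. */
int next_nglyc_site(const char *seq, int from)
{
    size_t len = strlen(seq);
    size_t i;

    assert(from >= 1);
    for (i = (size_t)from - 1; i + 3 < len; i++) {
        if (seq[i] == 'N' && seq[i + 1] != 'P' &&
            (seq[i + 2] == 'S' || seq[i + 2] == 'T') &&
            seq[i + 3] != 'P')
            return (int)i + 1;
    }
    return 0;
}
```

Called on the first 25 residues of the FIG. 2 sequence with `from = 1`, the function reports a match starting at residue 8 (N-Q-T-S).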
PRO XXXXXXXXXXXXXXX (Length = 15 amino acids)

Comparison Protein XXXXXYYYYYYY (Length = 12 amino acids)

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide 
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues 
of the PRO polypeptide) = 
5 divided by 15 = 33.3% 

FIG. 3A



PRO XXXXXXXXXX (Length = 10 amino acids)

Comparison Protein XXXXXYYYYYYZZYZ (Length = 15 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide 
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues 
of the PRO polypeptide) = 
5 divided by 10 = 50% 

FIG. 3B



PRO-DNA NNNNNNNNNNNNNN (Length = 14 nucleotides) 

Comparison DNA NNNNNNLLLLLLLLLL (Length = 16 nucleotides)

% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences 
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA 
nucleic acid sequence) = 
6 divided by 14 = 42.9% 

FIG. 3C



PRO-DNA NNNNNNNNNNNN (Length = 12 nucleotides) 

Comparison DNA NNNNLLLVV (Length = 9 nucleotides) 

% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences 
as determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA 
nucleic acid sequence) = 
4 divided by 12 = 33.3% 

FIG. 3D
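The four worked examples in FIGS. 3A-3D all apply the same rule: the number of identical matches divided by the length of the PRO (or PRO-DNA) sequence, expressed as a percentage. A minimal sketch of that arithmetic follows; the function name `percent_identity` is an assumption for illustration, and the match count is taken as an input rather than computed by an alignment.

```c
#include <assert.h>

/* Percent identity as defined in the figures: identical matches divided
 * by the total length of the PRO (or PRO-DNA) sequence, times 100.
 * Note that the comparison sequence length does not enter the result. */
double percent_identity(int identical, int pro_length)
{
    assert(pro_length > 0);
    assert(identical >= 0 && identical <= pro_length);
    return 100.0 * (double)identical / (double)pro_length;
}
```

With the figures' numbers, `percent_identity(5, 15)` is about 33.3, `percent_identity(5, 10)` is 50, `percent_identity(6, 14)` is about 42.9, and `percent_identity(4, 12)` is about 33.3.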
FIG. 4A

/*
 *
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define _M -8 /* value of a match with a stop */

int _day[26][26] = {
/*        A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};
FIG. 4B

/*
 */
#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in an path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 -1 */

struct diag {
    int         score;          /* score at last jmp */
    long        offset;         /* offset of prev block */
    short       ijmp;           /* current jmp index */
    struct jmp  jp;             /* list of jmps */
};

struct path {
    int     spc;                /* number of leading spaces */
    short   n[JMPS];            /* size of jmp (gap) */
    int     x[JMPS];            /* loc of jmp (last elem before gap) */
};

char        *ofile;             /* output file name */
char        *namex[2];          /* seq names: getseqs() */
char        *prog;              /* prog name for err msgs */
char        *seqx[2];           /* seqs: getseqs() */
int         dmax;               /* best diag: nw() */
int         dmax0;              /* final diag */
int         dna;                /* set if dna: main() */
int         endgaps;            /* set if penalizing end gaps */
int         gapx, gapy;         /* total gaps in seqs */
int         len0, len1;         /* seq lens */
int         ngapx, ngapy;       /* total size of gaps */
int         smax;               /* max score: nw() */
int         *xbm;               /* bitmap for matching */
long        offset;             /* current offset in jmp file */
struct diag *dx;                /* holds diagonals */
struct path pp[2];              /* holds path for seqs */

char        *calloc(), *malloc(), *index(), *strcpy();
char        *getseq(), *g_calloc();
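The gap constants in this header combine into an affine gap cost: a fixed opening penalty plus a per-element penalty, with MAXGAP capping how far a long gap keeps being penalized. The sketch below is one reading of how those constants are applied in nw(), not part of the listing; the function name `gap_cost` and the `SK_` constant names are assumptions chosen to avoid clashing with the header's macros.

```c
#include <assert.h>

/* Values copied from the header; names prefixed SK_ are hypothetical. */
enum { SK_MAXGAP = 24, SK_DINS0 = 8, SK_DINS1 = 1, SK_PINS0 = 8, SK_PINS1 = 4 };

/* Cost of one gap of gap_len elements (gap_len >= 1).
 * is_dna selects the DNA or protein penalties.
 * When endgaps is zero, elements beyond SK_MAXGAP add no further cost;
 * when endgaps is nonzero, the cap is ignored. */
int gap_cost(int is_dna, int gap_len, int endgaps)
{
    int open_penalty = is_dna ? SK_DINS0 : SK_PINS0;
    int per_element  = is_dna ? SK_DINS1 : SK_PINS1;

    assert(gap_len >= 1);
    if (!endgaps && gap_len > SK_MAXGAP)
        gap_len = SK_MAXGAP;
    return open_penalty + per_element * gap_len;
}
```

Under this reading a 3-base DNA gap costs 8 + 3*1 = 11, a 3-residue protein gap costs 8 + 3*4 = 20, and a 30-base DNA gap costs 32 without end-gap weighting.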
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *   where file1 and file2 are two dna or two protein sequences.
 *   The sequences can be in upper- or lower-case and may contain ambiguity
 *   Any lines beginning with '>' or '<' are ignored
 *   Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *   A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *   Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;            /* 1 to penalize endgaps */
    ofile = "align.out";    /* output file */

    nw();           /* fill in the matrix, get the possible jmps */
    readjmps();     /* get the actual jmps */
    print();        /* print stats, alignment */

    cleanup(0);     /* unlink any tmp files */
}
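The `_pbval` table above encodes each residue letter as a bit mask, so two residues "match" when their masks share a bit. B carries the D and N bits, Z carries the E and Q bits, and J is set to 0xFFFFFFF so it matches every letter. The following sketch restates that idea as two small functions; `residue_mask` and `residues_match` are hypothetical names for illustration, and the masks are computed on the fly rather than read from a table.

```c
#include <assert.h>

/* Bit mask for an upper-case residue letter, following the layout of
 * the protein table: one bit per letter, with B, Z and J widened. */
static unsigned long residue_mask(char c)
{
    unsigned long m;

    assert(c >= 'A' && c <= 'Z');
    m = 1UL << (c - 'A');
    if (c == 'B')                       /* B stands for D or N */
        m |= (1UL << ('D' - 'A')) | (1UL << ('N' - 'A'));
    if (c == 'Z')                       /* Z stands for E or Q */
        m |= (1UL << ('E' - 'A')) | (1UL << ('Q' - 'A'));
    if (c == 'J')                       /* J matches any letter */
        m = 0xFFFFFFFUL;
    return m;
}

/* Nonzero when the two residues share at least one bit. */
int residues_match(char a, char b)
{
    return (residue_mask(a) & residue_mask(b)) != 0;
}
```

For example, B matches D and N, but D and N do not match each other, because the ambiguity bits are attached only to the ambiguous code.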
FIG. 4D

/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }
...nw

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongoing del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongoing del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */

FIG. 4E
...nw

            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;

FIG. 4F-1
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}
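The nw() routine keeps only two score columns and swaps them after each pass, which is the standard space-saving form of Needleman-Wunsch scoring. The sketch below shows that two-row technique in isolation. It is deliberately simplified: it uses a single linear gap score instead of the listing's opening-plus-extension penalties, it does no jmp bookkeeping or traceback, and the name `nw_score_sketch` and its parameters are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Global alignment score of x against y with two rolling rows.
 * match and mismatch are added for aligned pairs; gap (normally
 * negative) is added for every gap position. On ties the diagonal
 * move is kept, so a mismatch is preferred to a gap. */
int nw_score_sketch(const char *x, const char *y,
                    int match, int mismatch, int gap)
{
    size_t lx = strlen(x), ly = strlen(y);
    size_t i, j;
    int *prev = malloc((ly + 1) * sizeof *prev);
    int *curr = malloc((ly + 1) * sizeof *curr);
    int *tmp, best;

    assert(prev != NULL && curr != NULL);
    for (j = 0; j <= ly; j++)           /* first row: leading gaps in x */
        prev[j] = (int)j * gap;
    for (i = 1; i <= lx; i++) {
        curr[0] = (int)i * gap;         /* leading gaps in y */
        for (j = 1; j <= ly; j++) {
            int s = prev[j-1] + (x[i-1] == y[j-1] ? match : mismatch);
            if (prev[j] + gap > s)
                s = prev[j] + gap;
            if (curr[j-1] + gap > s)
                s = curr[j-1] + gap;
            curr[j] = s;
        }
        tmp = prev; prev = curr; curr = tmp;    /* swap rows */
    }
    best = prev[ly];
    free(prev);
    free(curr);
    return best;
}
```

For instance, with match = 1, mismatch = -1 and gap = -1, aligning "ACGT" with "AGT" scores 2 (three matches and one gap position).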
/* print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;      /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr,"%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}
FIG. 4H

/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;                 /* "core" (minus endgaps) */
    int     firstgap, lastgap;      /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);
...getmat

    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized, left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static          nm;             /* matches in core -- for checking */
static          lmax;           /* lengths of stripped file names */
static          ij[2];          /* jmp index for a path */
static          nc[2];          /* number at start of current line */
static          ni[2];          /* current elem number -- for gapping */
static          siz[2];
static char     *ps[2];         /* ptr to current element */
static char     *po[2];         /* ptr to next output char slot */
static char     out[2][P_LINE]; /* output line */
static char     star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;     /* char count */
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }

    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {        /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {      /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {                  /* we're putting a seq element
                                     */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}

/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register    i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

FIG. 4J
...dumpblock

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{

FIG. 4K
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
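stripname() removes a leading directory path by copying the tail of the string over its own start. Copying between overlapping buffers with strcpy() is undefined behavior in standard C, so a present-day rewrite might return a pointer into the original string instead. The sketch below shows that alternative; `strip_path` is a hypothetical name and is not part of the listing.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the part of name after the last '/',
 * or name itself if it contains no '/'. The input is not modified. */
const char *strip_path(const char *name)
{
    const char *slash = strrchr(name, '/');

    return slash ? slash + 1 : name;
}
```

The length that stripname() returns would then be `strlen(strip_path(name))`.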
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checking
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char    *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file,"r")) == 0) {
        fprintf(stderr,"%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr,"%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';
...getseq

    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU",*(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
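getseq() decides whether its input is DNA with a simple count: the sequence is treated as DNA when the number of A, T, G, C or U letters exceeds one third of all letters (integer division, as in `natgc > (tlen/3)`). The sketch below isolates that heuristic for an in-memory string. The name `looks_like_dna` is an assumption, and unlike the listing it counts each letter exactly once and ignores non-letters.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if more than a third of the letters in seq are A, C, G, T
 * or U (case-insensitive), else 0. Non-letter characters are skipped. */
int looks_like_dna(const char *seq)
{
    int letters = 0, acgtu = 0;

    for (; *seq; seq++) {
        unsigned char c = (unsigned char)*seq;

        if (!isalpha(c))
            continue;
        letters++;
        if (strchr("ACGTU", toupper(c)))
            acgtu++;
    }
    return acgtu > letters / 3;
}
```

On the first 25 residues of the FIG. 2 protein, only 7 letters are A, G or T, which does not exceed 25/3 = 8, so the string is classified as protein.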

char    *
g_calloc(msg, nx, sz)
    char    *msg;       /* program, calling routine */
    int     nx, sz;     /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}

/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;

FIG. 4O



Inventor: Pennica et al.

Docket No.: 11669.0163USU1

Title: NOVEL STRA6 POLYPEPTIDES

Serial No.: 09/759,056

Sheet 21 of 35



...readjmps

            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {          /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {     /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }
    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}
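writejmps() creates its scratch file with mktemp() followed by fopen(). On current POSIX systems the usual replacement is mkstemp(), which creates and opens the file in one step and needs a writable template buffer rather than a string literal. The sketch below shows that pattern under those assumptions; `open_jmp_tmpfile` is a hypothetical helper and is not part of the listing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create and open a unique temporary file from a template ending in
 * "XXXXXX". The template buffer is overwritten with the real name.
 * Returns a read/write stream, or NULL on failure. */
FILE *open_jmp_tmpfile(char *tmpl)
{
    int  fd = mkstemp(tmpl);
    FILE *fp;

    if (fd < 0)
        return NULL;
    fp = fdopen(fd, "w+");
    if (fp == NULL) {       /* give the descriptor and the file back */
        (void) close(fd);
        (void) unlink(tmpl);
    }
    return fp;
}
```

A caller would declare the template as an array, for example `char name[] = "/tmp/homgXXXXXX";`, and unlink `name` once the stream is closed.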
GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG
GAAGTGTGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATG 
CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 
TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGC 
TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC 
TTCCTGGGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC 
CTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCT 
GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 
CAGCTGACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 
CTGGTGGGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC 
CACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGC 
TACTACACGTACCGAA 
CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGCTGGGCTCAGAGGAGAAGGC
CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATAAAG 
GAAAGGAAAGAGACAAGGAAGGGAGAGGTCAGGAGAGCGCTTGATTGGAGGAGAAGGGCC 
AGAGA ATG TCGTCCCAGCCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT 
CCTATGGCAGCTGGTACATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 
AAGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGC 
TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTG 
ACTGTGTGCGTGGCAGGCCCGGCCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGCCCTTCCTGA 
CTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGGGCCTGGAAGA 
TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGTGCCACGGCTG 
GCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTTGGGGTCCAGG 
TCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTACTCCCTGCTGG 
CCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCTGTGCAGCTGG 
TGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGCAGCTACTCTG 
AGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTACCACACCTCCA 
AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTACACTCCACAGC 
CAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGGACGGCCATTT 
ACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAGGTGAGGGCAG 
GGGTCACCAGGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTGTGCTACATCT 
CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCACTGGTGACAC 
ACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGCATC 
GGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGTGCCTACCAGA 
CAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTGGGAACCACGG 
CCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTGCTCTTCCGTT 
CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAGAACA 
TGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTGACCAACCGGC 
GAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTGGGTGCCATAG 
TGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTTGGCCAGATGG 
ACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTACACGTACCGAA 
ACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC 
TCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGACAGCCTCAGAC 
CAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACGCTGCTGCACA 
ACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGTGCCCAGCCCT 
GAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCCTGCCTACGAG 
CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCAGCAGGTCCTCC 
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAGGGCTCTGCTCC 
ACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTCCCTACCCTGGC 
TCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACTCCAGCCCAGCT 
CCACCTCAGCCTTGGCCTTCACGCTGTGGTVAGCAGCCAAGGCACTTCCTCACCCCCTCAG 
CGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGGCCTCTGGCCTGCAGGGCAG 
CCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGAGAGCCAGATAT 
TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTCCCTGCAATAAA 
CTTGTTCCTGAGAAAAA 
MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASL
SILVLLLLAMLVRRRQLWPDCVRGRPGLPRPRAVPAAVFMVLLSSLCLLLPDEDALPFL
TLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLSWAHLGV
QVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSKGLQSS
YSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSATLTG
TAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLWALE
VCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCWMS
FSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLAL
AVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAIVATWRVLLSALYN
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTM 
AAPQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTA 
LLGANGAQP 

Important features of the protein: 

Signal peptide: 

none 

Transmembrane domain: 

54-71 

93-111 

140-157 

197-214 

291-312 

356-371 

425-444 

464-481 

505-522 

Motif name: N-glycosylation site. 
8-12 

Motif name: N-myristoylation site. 

50-56 
167-173 
232-238 
308-314 
332-338 
516-522 
618-624 
622-628 
631-637 
652-658 

Motif name: Prokaryotic membrane lipoprotein lipid attachment site. 
355-366 



Motif name: ATP/GTP-binding site motif A (P-loop). 
123-131 
Stra6 RNA Expression in Human Colon Tumor Tissue
vs Normal Mucosa From the Same Patient

Taqman Product Analysis After 40 Cycles

[Gel image: stra6 product, one tumor (T) lane and one normal mucosa (N) lane per patient]

Tumor # 850 851 892 869 893 870 871 848 872 778 17 64 76 18
Stra6 RNA Expression in Human Colon
Carcinoma Cells +/- Retinoic Acid

TM #75 (2/28/00) VD3 - Vitamin D3 (1 µM);
ATRA - All-Trans-Retinoic Acid (1 µM); 9cRA - 9-Cis-Retinoic Acid (1 µM)